Publications

The list below is a local compilation of my publication repository.

Editorials
2023
[4]N. Roma, B. Zatt, Special Issue on “SBCCI’2022”, IEEE Design & Test, IEEE, vol. 5, no. 5, pp. 5-6, 2023. [bibtex] [pdf] [doi]
2021
[3]L. Sousa, N. Roma and P. Tomás, Euro-Par 2021: Proceedings of the 27th International Conference on Parallel and Distributed Computing, Lecture Notes in Computer Science (LNCS), Springer, no. 12820, 2021. [bibtex] [pdf] [doi]
2017
[2]L. Sousa, N. Roma, Special Issue on Real-Time Energy-Aware Circuits and Systems for HEVC and for its 3D and SVC Extensions, Journal of Real-Time Image Processing, Springer, vol. 1, no. 1, pp. 1-3, 2017. [bibtex] [pdf] [doi]
2016
[1]N. Roma, J. Nunez-Yanez, Special Issue on Energy Efficient Architectures for Embedded Systems, EURASIP Journal on Embedded Systems, Springer, vol. 20, pp. 1-2, 2016. [bibtex] [pdf] [doi]
Book Chapters
2017
[4]N. Roma, A. Rodrigues and L. Sousa, "Parallel Programming Framework for H.264/AVC Video Encoding in Multicore Systems", in Programming multi‐core and many‐core computing systems, S. Pllana, F. Xhafa, Eds., John Wiley & Sons, Ltd, 2017, pp. 281-300. [bibtex] [pdf] [doi]
2004
[3]N. Roma, T. Dias and L. Sousa, "Customisable Core-Based Architectures for Real-Time Motion Estimation on FPGAs", in New Algorithms, Architectures, and Applications for Reconfigurable Computing, P. Y. K. Cheung, G. A. Constantinides, J. T. d. Sousa, Eds., Springer-Verlag, 2004, pp. 55–66. [bibtex] [pdf] [doi]
2002
[2]N. Roma, L. Sousa, "A New Efficient VLSI Architecture for Full Search Block Matching Motion Estimation", in SOC Design Methodologies, M. Robert et al., Eds., Kluwer Academic Publishers, 2002, pp. 253-264. [bibtex] [pdf] [doi]
[1]N. Roma, J. Santos-Victor and J. Tomé, "A Comparative Analysis of Cross-Correlation Matching Algorithms Using a Pyramidal Resolution Approach", in Empirical Evaluation Methods in Computer Vision, H. I. Christensen, P. J. Phillips, Eds., World Scientific Press, 2002, pp. 117–142. [bibtex] [pdf] [doi]
International Journal Articles
2024
[46]J. Vieira, N. Roma, G. Falcao, P. Tomás, "NDPmulator: Enabling Full-System Simulation for Near-Data Accelerators from Caches to DRAM", IEEE Access, vol. 12, January 2024, pp. 10349–10365. [bibtex] [pdf] [doi]
[45]J. Vieira, N. Roma, G. Falcao, P. Tomás, "gem5-accel: A Pre-RTL Simulation Toolchain for Accelerator Architecture Validation", IEEE Computer Architecture Letters, vol. 23, no. 1, Jan.-June 2024, pp. 1-4. [bibtex] [pdf] [doi]
2023
[44]A. Saha, N. Roma, M. Chavarrías, T. Dias, F. Pescador, V. Aranda, "GPU-based Parallelisation of a Versatile Video Coding Adaptive Loop Filter in Resource-Constrained Heterogeneous Embedded Platform", Journal of Real-Time Image Processing, no. 3, mar 2023, pp. 1–13. [bibtex] [pdf] [doi]
2022
[43]N. Neves, J. M. Domingos, N. Roma, P. Tomás, G. Falcão, "Compiling for Vector Extensions with Stream-based Specialization", IEEE Micro, no. 5, sep 2022, pp. 49–58. [bibtex] [pdf] [doi]
[42]F. Mendes, P. Tomás and N. Roma, "Decoupling GPGPU voltage-frequency scaling for deep-learning applications", Journal of Parallel and Distributed Computing, jul 2022, pp. 32–51. [bibtex] [pdf] [doi]
[41]L. Crespo, P. Tomás, N. Roma, N. Neves, "Unified Posit/IEEE-754 Vector MAC Unit for Transprecision Computing", IEEE Transactions on Circuits and Systems II: Express Briefs, no. 5, may 2022, pp. 2478-2482. [bibtex] [pdf] [doi]
2021
[40]N. Neves, P. Tomás and N. Roma, "A Reconfigurable Posit Tensor Unit with Variable-Precision Arithmetic and Automatic Data Streaming", Journal of Signal Processing Systems, vol. 93, no. 12, dec 2021, pp. 1365-1385. [bibtex] [pdf] [doi]
[39]J. Vieira, N. Roma, G. Falcao, P. Tomás, "A Compute Cache System for Signal Processing Applications", Journal of Signal Processing Systems, vol. 93, no. 10, oct 2021, pp. 1173-1186. [bibtex] [pdf] [doi]
[38]R. Porto, M. Perleberg, V. Afonso, B. Zatt, N. Roma, L. Agostini, M. Porto, "Fast and energy-efficient approximate motion estimation architecture for real-time 4K UHD processing", Journal of Real-Time Image Processing, vol. 18, jun 2021, pp. 723-737. [bibtex] [pdf] [doi]
[37]N. Neves, P. Tomás and N. Roma, "Compiler-Assisted Data Streaming for Regular Code Structures", IEEE Transactions on Computers, vol. 70, no. 3, mar 2021, pp. 483–494. [bibtex] [pdf] [doi]
2020
[36]R. Porto, M. Corrêa, J. Goebel, B. Zatt, N. Roma, L. Agostini, M. Porto, "UHD 8K energy-quality scalable HEVC intra-prediction SAD unit hardware using optimized and configurable imprecise adders", Journal of Real-Time Image Processing, vol. 17, no. 5, oct 2020, pp. 1685–1701. [bibtex] [pdf] [doi]
2019
[35]J. Guerreiro, A. Ilic, N. Roma, P. Tomás, "GPU Static Modeling Using PTX and Deep Structured Learning", IEEE Access, vol. 7, dec 2019, pp. 159150–159161. [bibtex] [pdf] [doi]
[34]J. Guerreiro, A. Ilic, N. Roma, P. Tomás, "Modeling and Decoupling the GPU Power Consumption for Cross-Domain DVFS", IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 11, nov 2019, pp. 2494–2506. [bibtex] [pdf] [doi]
[33]R. Marques, L. Russo and N. Roma, "Flying tourist problem: Flight time and cost minimization in complex routes", Expert Systems with Applications, vol. 130, sep 2019, pp. 172–187. [bibtex] [pdf] [doi]
[32]J. Guerreiro, A. Ilic, N. Roma, P. Tomás, "DVFS-aware application classification to improve GPGPUs energy efficiency", Parallel Computing, vol. 83, apr 2019, pp. 93–117. [bibtex] [pdf] [doi]
2018
[31]N. Neves, P. Tomás and N. Roma, "Stream data prefetcher for the GPU memory interface", Journal of Supercomputing, vol. 74, no. 6, jun 2018, pp. 2314–2328. [bibtex] [pdf] [doi]
[30]B. Wang, D. F. d. Souza, M. A. Mesa, C. C. Chi, B. H. H. Juurlink, A. Ilic, N. Roma, L. Sousa, "Highly parallel HEVC decoding for heterogeneous systems with CPU and GPU", Signal Processing: Image Communication, vol. 62, mar 2018, pp. 93–105. [bibtex] [pdf] [doi]
2017
[29]B. Wang, D. F. d. Souza, M. A. Mesa, C. C. Chi, B. H. H. Juurlink, A. Ilic, N. Roma, L. Sousa, "GPU Parallelization of HEVC In-Loop Filters", International Journal of Parallel Programming, vol. 45, no. 6, dec 2017, pp. 1515–1535. [bibtex] [pdf] [doi]
[28]J. Feldt, S. Miranda, F. Pratas, N. Roma, P. Tomás, R. A. Mata, "Optimization and benchmarking of a perturbative Metropolis Monte Carlo quantum mechanics/molecular mechanics program", The Journal of Chemical Physics, vol. 147, no. 24, dec 2017, pp. 244105. [bibtex] [pdf] [doi]
[27]S. Miranda, J. Feldt, F. Pratas, R. A. Mata, N. Roma, P. Tomás, "Efficient parallelization of perturbative Monte Carlo QM/MM simulations in heterogeneous platforms", International Journal of High Performance Computing Applications, vol. 31, no. 6, nov 2017, pp. 499–516. [bibtex] [pdf] [doi]
[26]N. Neves, P. Tomás and N. Roma, "Adaptive In-Cache Streaming for Efficient Data Management", IEEE Transactions on Very Large Scale Integration Systems, vol. 25, no. 7, jul 2017, pp. 2130–2143. [bibtex] [pdf] [doi]
[25]D. F. d. Souza, A. Ilic, N. Roma, L. Sousa, "GHEVC: An Efficient HEVC Decoder for Graphics Processing Units", IEEE Transactions on Multimedia, vol. 19, no. 3, mar 2017, pp. 459–474. [bibtex] [pdf] [doi]
2016
[24]D. Nogueira, P. Tomás and N. Roma, "BowMapCL: Burrows-Wheeler Mapping on Multiple Heterogeneous Accelerators", IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 5, sep 2016, pp. 926–938. [bibtex] [pdf] [doi]
[23]D. F. d. Souza, A. Ilic, N. Roma, L. Sousa, "GPU-assisted HEVC intra decoder", Journal of Real-Time Image Processing, vol. 12, no. 2, aug 2016, pp. 531–547. [bibtex] [pdf] [doi]
[22]N. Neves, R. Neves, N. Horta, P. Tomás, N. Roma, "Multi-objective kernel mapping and scheduling for morphable many-core architectures", Expert Systems with Applications, vol. 45, mar 2016, pp. 385–399. [bibtex] [pdf] [doi]
[21]A. Ilic, S. Momcilovic, N. Roma, L. Sousa, "Adaptive Scheduling Framework for Real-Time Video Encoding on Heterogeneous Systems", IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 3, mar 2016, pp. 597–611. [bibtex] [pdf] [doi]
[20]S. Momcilovic, N. Roma and L. Sousa, "Exploiting task and data parallelism for advanced video coding on hybrid CPU + GPU platforms", Journal of Real-Time Image Processing, vol. 11, no. 3, mar 2016, pp. 571–587. [bibtex] [pdf] [doi]
2015
[19]N. Neves, N. Sebastião, D. M. d. Matos, P. Tomás, P. Flores, N. Roma, "Multicore SIMD ASIP for Next-Generation Sequencing and Alignment Biochip Platforms", IEEE Transactions on Very Large Scale Integration Systems, vol. 23, no. 7, jul 2015, pp. 1287–1300. [bibtex] [pdf] [doi]
[18]N. Sebastião, G. Encarnação and N. Roma, "Implementation and performance analysis of efficient index structures for DNA search algorithms in parallel platforms", Journal on Concurrency and Computation: Practice & Experience, vol. 27, no. 9, jun 2015, pp. 2351–2368. [bibtex] [pdf] [doi]
[17]T. Ferreirinha, R. Nunes, L. Azevedo, A. Soares, F. Pratas, P. Tomás, N. Roma, "Acceleration of stochastic seismic inversion in OpenCL-based heterogeneous platforms", Computers & Geosciences, vol. 78, may 2015, pp. 26–36. [bibtex] [pdf] [doi]
[16]N. Neves, H. Mendes, R. Chaves, P. Tomás, N. Roma, "Morphable hundred-core heterogeneous architecture for energy-aware computation", IET Computers & Digital Techniques, vol. 9, no. 1, jan 2015, pp. 49–62. [bibtex] [pdf] [doi]
2014
[15]T. Dias, N. Roma and L. Sousa, "Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs", EURASIP Journal on Advances in Signal Processing, vol. 2014, jul 2014, pp. 108. [bibtex] [pdf] [doi]
[14]M. Ferreira, N. Roma and L. Russo, "Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER", BMC Bioinformatics, vol. 15, may 2014, pp. 165. [bibtex] [pdf] [doi]
[13]S. Momcilovic, A. Ilic, N. Roma, L. Sousa, "Dynamic Load Balancing for Real-Time Video Encoding on Heterogeneous CPU+GPU Systems", IEEE Transactions on Multimedia, vol. 16, no. 1, jan 2014, pp. 108–121. [bibtex] [pdf] [doi]
2013
[12]N. Sebastião, N. Roma and P. Flores, "Configurable and scalable class of high performance hardware accelerators for simultaneous DNA sequence alignment", Journal on Concurrency and Computation: Practice & Experience, vol. 25, no. 10, jul 2013, pp. 1319–1339. [bibtex] [pdf] [doi]
[11]T. Dias, S. López, N. Roma, L. Sousa, "Scalable Unified Transform Architecture for Advanced Video Coding Embedded Systems", International Journal of Parallel Programming, vol. 41, no. 2, apr 2013, pp. 236–260. [bibtex] [pdf] [doi]
2012
[10]N. Sebastião, N. Roma and P. Flores, "Integrated Hardware Architecture for Efficient Computation of the n-Best Bio-Sequence Local Alignments in Embedded Platforms", IEEE Transactions on Very Large Scale Integration Systems, vol. 20, no. 7, jul 2012, pp. 1262–1275. [bibtex] [pdf] [doi]
[9]N. Sebastião, N. Roma and P. Flores, "Hardware accelerator architecture for simultaneous short-read DNA sequences alignment with enhanced traceback phase", Microprocessors and Microsystems: Embedded Hardware Design (MICPRO), vol. 36, no. 2, mar 2012, pp. 96–109. [bibtex] [pdf] [doi]
2011
[8]N. Roma, L. Sousa, "A tutorial overview on the properties of the discrete cosine transform for encoded image and video processing", Signal Processing, vol. 91, no. 11, nov 2011, pp. 2443–2464. [bibtex] [pdf] [doi]
[7]T. Dias, S. López, N. Roma, L. Sousa, "A flexible architecture for the computation of direct and inverse transforms in H.264/AVC video codecs", IEEE Transactions on Consumer Electronics, vol. 57, no. 2, may 2011, pp. 936–944. [bibtex] [pdf] [doi]
2007
[6]N. Roma, L. Sousa, "Efficient hybrid DCT-domain algorithm for any arbitrary integer re-size video downscaling", EURASIP Journal on Advances in Signal Processing, vol. 2007, no. 57291, sep 2007, pp. 1–16. [bibtex] [pdf] [doi]
[5]T. Dias, S. Momcilovic, N. Roma, L. Sousa, "Adaptive Motion Estimator for Autonomous Video Devices", EURASIP Journal on Embedded Systems, special issue on Embedded Systems for Portable and Mobile Video Platforms, vol. 2007, no. 57234, may 2007, pp. 1–10. [bibtex] [pdf] [doi]
[4]T. Dias, N. Roma, L. Sousa, M. Ribeiro, "Reconfigurable architectures and processors for real-time video motion estimation", Journal of Real-Time Image Processing, vol. 2, no. 4, dec 2007, pp. 191–205. [bibtex] [pdf] [doi]
2003
[3]N. Roma, L. Sousa, "Fast transcoding architectures for insertion of non-regular shaped objects in the compressed DCT-domain", Signal Processing: Image Communication, vol. 18, no. 8, sep 2003, pp. 659–683. [bibtex] [pdf] [doi]
[2]N. Roma, L. Sousa, "Automatic Synthesis of Motion Estimation Processors Based on a New Class of Hardware Architectures", Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology, vol. 34, no. 3, jul 2003, pp. 277–290. [bibtex] [pdf] [doi]
2002
[1]N. Roma, L. Sousa, "Efficient and configurable full-search block-matching processors", IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 12, dec 2002, pp. 1160–1167. [bibtex] [pdf] [doi]
International Conference Papers
2024
[85]I. Storch, N. Roma, D. Palomino, S. Bampi, "Alternative Reference Samples to Improve Coding Efficiency for Parallel Intra Prediction Solutions", in Latin America Symposium on Circuits and Systems (LASCAS), IEEE, feb, 2024. [bibtex] [url]
2023
[84]L. Crespo, P. Tomás, N. Roma, N. Neves, "Trading Performance, Power, and Area on Low-Precision Posit MAC Units for CNN Training", in IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), IEEE, oct, 2023, pp. 46-56. [bibtex] [pdf] [doi]
[83]A. B. Fernandes, N. Neves, L. Crespo, P. Tomás, N. Roma, G. Falcao, "A functional validation framework for the Unlimited Vector Extension", in The 1st Workshop on Computer Architecture Modeling and Simulation (CAMS), (no proceedings published), oct, 2023. [bibtex] [pdf]
[82]I. Storch, N. Roma, D. Palomino, S. Bampi, "GPU Acceleration of MIP Intra Prediction in VVC", in European Signal Processing Conference (EUSIPCO), EURASIP, sep, 2023. [bibtex] [pdf] [doi]
[81]J. M. Domingos, T. Rocha, N. Neves, N. Roma, P. Tomás, L. Sousa, "Supporting RISC-V Performance Counters Through Performance Analysis Tools for Linux", in IEEE International Conference on Application-specific Systems, Architectures, and Processors (ASAP), IEEE, jul, 2023. [bibtex] [pdf] [doi]
[80]T. Malcata, N. Sebastião, T. M. Dias, N. Roma, "Neural Network Predictor for Fast Channel Change on DVB Set-Top-Boxes", in Workshop on Design and Architectures for Signal and Image Processing (DASIP), Springer, jan, 2023, pp. 40–52. [bibtex] [pdf] [doi]
2022
[79]J. Vieira, N. Roma, G. Falcao, P. Tomás, "gem5-ndp: Near-Data Processing Architecture Simulation From Low Level Caches to DRAM", in IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), IEEE, nov, 2022, pp. 41–50. [bibtex] [pdf] [doi]
[78]M. Rosado, S. Mallios, P. Tomás, N. Roma, A. David, "Early prototyping and testing of CERN LHC CMS high-granularity calorimeter slow-control system", in International Workshop on Rapid System Prototyping (RSP), IEEE, oct, 2022. [bibtex] [pdf] [doi]
[77]M. M. Correa, N. Roma, D. M. V. Palomino, G. R. Correa, L. Agostini, "Mode-Adaptive Subsampling of SAD/SSE Operations for Intra Prediction Cost Reduction", in IEEE International Symposium on Circuits & Systems (ISCAS), IEEE, may, 2022. [bibtex] [pdf] [doi]
2021
[76]G. Raposo, P. Tomás and N. Roma, "PositNN: Training Deep Neural Networks with Mixed Low-Precision Posit", in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), jun, 2021. [bibtex] [pdf] [doi]
[75]J. Mário, N. Neves, N. Roma, P. Tomás, "Unlimited Vector Extension with Data Streaming Support", in International Symposium on Computer Architecture (ISCA), jun, 2021. [bibtex] [pdf] [doi]
[74]M. Pinho, P. Tomás and N. Roma, "Packing and Fusing Narrow-Width Vector Operations for Energy-Efficient SIMD", in International Conference on High Performance Computing & Simulation (HPCS), mar, 2021. [bibtex] [pdf]
2020
[73]N. Neves, P. Tomás and N. Roma, "Dynamic Fused Multiply-Accumulate Posit Unit with Variable Exponent Size for Low-Precision DSP Applications", in IEEE Workshop on Signal Processing Systems (SiPS), oct, 2020, pp. 1-6. [bibtex] [pdf] [doi]
[72]F. Mendes, P. Tomás and N. Roma, "Exploiting Non-conventional DVFS on GPUs: Application to Deep Learning", in IEEE International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), IEEE, sep, 2020, pp. 1–9. [bibtex] [pdf] [doi]
[71]R. Porto, B. Zatt, N. Roma, L. Agostini, M. Porto, "2PSA: An Optimized and Flexible Power-Precision Scalable Adder", in Symposium on Integrated Circuits and Systems Design (SBCCI), IEEE, aug, 2020, pp. 1–6. [bibtex] [pdf] [doi]
[70]N. Neves, P. Tomás and N. Roma, "Reconfigurable Stream-based Tensor Unit with Variable-Precision Posit Arithmetic", in IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), IEEE, jul, 2020, pp. 149–156. [bibtex] [pdf] [doi]
[69]J. Vieira, N. Roma, G. Falcao, P. Tomás, "Processing Convolutional Neural Networks on Cache", in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, may, 2020, pp. 1658–1662. [bibtex] [pdf] [doi]
2019
[68]P. Sá, H. Aidos, N. Roma, P. Tomás, "Heart Disease Detection Architecture for Lead I Off-the-Person ECG Monitoring Devices", in European Signal Processing Conference (EUSIPCO), IEEE, sep, 2019, pp. 1–5. [bibtex] [pdf] [doi]
[67]R. Porto, L. Agostini, B. Zatt, N. Roma, M. Porto, "Power-Efficient Approximate SAD Architecture with LOA Imprecise Adders", in IEEE Latin American Symposium on Circuits & Systems (LASCAS), R. S. Murphy, Ed., IEEE, feb, 2019, pp. 65–68. [bibtex] [pdf] [doi]
2018
[66]J. Guerreiro, A. Ilic, N. Roma, P. Tomás, "GPGPU Power Modeling for Multi-domain Voltage-Frequency Scaling", in IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE Computer Society, feb, 2018, pp. 789–800. [bibtex] [pdf] [doi]
[65]J. Vieira, N. Roma, P. Tomás, P. Ienne, G. Falcao, "Exploiting Compute Caches for Memory Bound Vector Operations", in 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), IEEE, feb, 2018, pp. 197–200. [bibtex] [pdf] [doi]
2017
[64]R. Porto, L. Agostini, B. Zatt, M. Porto, N. Roma, L. Sousa, "Energy-efficient motion estimation with approximate arithmetic", in 19th IEEE International Workshop on Multimedia Signal Processing (MMSP), IEEE, oct, 2017, pp. 1–6. [bibtex] [pdf] [doi]
2016
[63]B. Wang, M. A. Mesa, C. C. Chi, B. Juurlink, D. d. Souza, A. Ilic, N. Roma, L. Sousa, "Efficient HEVC decoder for heterogeneous CPU with GPU systems", in 18th IEEE International Workshop on Multimedia Signal Processing (MMSP), IEEE, sep, 2016, pp. 1–6. [bibtex] [pdf] [doi]
[62]J. Guerreiro, A. Ilic, N. Roma, P. Tomás, "Performance and Power-Aware Classification for Frequency Scaling of GPGPU Applications", in International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar), F. Desprez et al., Eds., Springer, aug, 2016, pp. 134–146. [bibtex] [pdf] [doi]
[61]N. Neves, A. Mussio, F. Gonçalves, P. Tomás, N. Roma, "In-Cache Streaming: Morphable Infrastructure for Many-Core Processing Systems", in International Workshop on UnConventional High Performance Computing (UCHPC), F. Desprez et al., Eds., Springer, aug, 2016, pp. 775–787. [bibtex] [pdf] [doi]
[60]M. T. Cruz, P. Tomás and N. Roma, "Unsupervised variable-grained online phase clustering for heterogeneous/morphable processors", in International Conference on High Performance Computing & Simulation (HPCS), IEEE, jul, 2016, pp. 858–865. [bibtex] [pdf] [doi]
[59]R. Pinheiro, N. Roma and P. Tomás, "A Cross-Core Performance Model for Heterogeneous Many-Core Architectures", in International Meeting on High Performance Computing for Computational Science (VECPAR’2016), I. Dutra et al., Eds., Springer, jun, 2016, pp. 101–111. [bibtex] [pdf] [doi]
2015
[58]D. F. d. Souza, A. Ilic, N. Roma, L. Sousa, "GPU acceleration of the HEVC decoder inter prediction module", in IEEE Global Conference on Signal and Information Processing (GlobalSIP), IEEE, dec, 2015, pp. 1245–1249. [bibtex] [pdf] [doi]
[57]S. Momcilovic, N. Roma, L. Sousa, I. Z. Milentijevic, "Run-Time Machine Learning for HEVC/H.265 Fast Partitioning Decision", in IEEE International Symposium on Multimedia (ISM), IEEE Computer Society, dec, 2015, pp. 347–350. [bibtex] [pdf] [doi]
[56]M. Rodrigues, N. Roma and P. Tomás, "Fast and Scalable Thread Migration for Multi-core Architectures", in IEEE International Conference on Embedded and Ubiquitous Computing (EUC), E. Bozorgzadeh et al., Eds., IEEE Computer Society, oct, 2015, pp. 9–16. [bibtex] [pdf] [doi]
[55]N. Neves, P. Tomás and N. Roma, "Efficient data-stream management for shared-memory many-core systems", in International Conference on Field Programmable Logic and Applications (FPL), IEEE, sep, 2015, pp. 1–8. [bibtex] [pdf] [doi]
[54]D. F. d. Souza, A. Ilic, N. Roma, L. Sousa, "HEVC in-loop filters GPU parallelization in embedded systems", in International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), D. Soudris, L. Carro, Eds., IEEE, jul, 2015, pp. 123–130. [bibtex] [pdf] [doi]
[53]D. F. d. Souza, A. Ilic, N. Roma, L. Sousa, "Towards GPU HEVC intra decoding: Seizing fine-grain parallelism", in IEEE International Conference on Multimedia and Expo (ICME), IEEE Computer Society, jun, 2015, pp. 1–6. [bibtex] [pdf] [doi]
[52]T. Dias, N. Roma and L. Sousa, "High performance IP core for HEVC quantization", in IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, may, 2015, pp. 2828–2831. [bibtex] [pdf] [doi]
[51]M. T. Cruz, P. Tomás and N. Roma, "Energy-Efficient Architecture for DP Local Sequence Alignment: Exploiting ILP and DLP", in International Conference on Bioinformatics and Biomedical Engineering (IWBBIO), F. M. O. Guzman, I. Rojas, Eds., Springer, apr, 2015, pp. 194–206. [bibtex] [pdf] [doi]
[50]J. Guerreiro, A. Ilic, N. Roma, P. Tomás, "Multi-kernel Auto-Tuning on GPUs: Performance and Energy-Aware Optimization", in Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), M. Daneshtalab et al., Eds., IEEE Computer Society, mar, 2015, pp. 438–445. [bibtex] [pdf] [doi]
2014
[49]A. Gorobets, F. Pratas, N. Roma, P. Tomás, "Stream Oriented Modular Architecture with Polymorphic Processing Engines", in IEEE International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PAD), IEEE Computer Society, oct, 2014, pp. 84–89. [bibtex] [pdf] [doi]
[48]S. Momcilovic, A. Ilic, N. Roma, L. Sousa, "Efficient Parallel Video Encoding on Heterogeneous Systems", in Network for Sustainable Ultrascale Computing Workshop (NESUS), oct, 2014. [bibtex] [pdf]
[47]S. Momcilovic, A. Ilic, N. Roma, L. Sousa, "Collaborative inter-prediction on CPU+GPU systems", in IEEE International Conference on Image Processing (ICIP), IEEE, oct, 2014, pp. 1228–1232. [bibtex] [pdf] [doi]
[46]A. Ilic, S. Momcilovic, N. Roma, L. Sousa, "FEVES: Framework for Efficient Parallel Video Encoding on Heterogeneous Systems", in International Conference on Parallel Processing (ICPP), IEEE Computer Society, sep, 2014, pp. 20–29. [bibtex] [pdf] [doi]
[45]D. F. d. Souza, N. Roma and L. Sousa, "OpenCL parallelization of the HEVC de-quantization and inverse transform for heterogeneous platforms", in European Signal Processing Conference (EUSIPCO), IEEE, sep, 2014, pp. 755–759. [bibtex] [pdf]
[44]N. Sebastião, P. F. Flores and N. Roma, "Optimized ASIP architecture for compressed BWT-indexed search in bioinformatics applications", in International Conference on High Performance Computing & Simulation (HPCS), IEEE, jul, 2014, pp. 527–534. [bibtex] [pdf] [doi]
[43]D. Nogueira, P. Tomás and N. Roma, "Burrows-Wheeler Transform based indexed exact search on a multi-GPU OpenCL platform", in International Conference on High Performance Computing & Simulation (HPCS), IEEE, jul, 2014, pp. 31–38. [bibtex] [pdf] [doi]
[42]M. T. Cruz, P. Tomás and N. Roma, "Low-power vectorial VLIW architecture for maximum parallelism exploitation of dynamic programming algorithms", in International Conference on High Performance Computing & Simulation (HPCS), IEEE, jul, 2014, pp. 88–95. [bibtex] [pdf] [doi]
[41]D. F. d. Souza, N. Roma and L. Sousa, "Cooperative CPU+GPU deblocking filter parallelization for high performance HEVC video codecs", in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, may, 2014, pp. 4993–4997. [bibtex] [pdf] [doi]
[40]T. Ferreirinha, R. Nunes, A. Soares, F. Pratas, P. Tomás, N. Roma, "GPU Accelerated Stochastic Inversion of Deep Water Seismic Data", in International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar), L. M. B. Lopes et al., Eds., Springer, 2014, pp. 239–250. [bibtex] [pdf] [doi]
2013
[39]S. Paiagua, F. Pratas, P. Tomás, N. Roma, R. Chaves, "HotStream: Efficient Data Streaming of Complex Patterns to Multiple Accelerating Kernels", in International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), IEEE Computer Society, oct, 2013, pp. 17–24. [bibtex] [pdf] [doi]
[38]T. Dias, N. Roma and L. Sousa, "High performance multi-standard architecture for DCT computation in H.264/AVC High Profile and HEVC codecs", in Conference on Design and Architectures for Signal and Image Processing (DASIP), IEEE, oct, 2013, pp. 14–21. [bibtex] [pdf]
[37]J. M. Leitão, J. A. Germano, N. Roma, R. Chaves, P. Tomás, "Scalable and high throughput biosensing platform", in International Conference on Field Programmable Logic and Applications (FPL), IEEE, sep, 2013, pp. 1–6. [bibtex] [pdf] [doi]
[36]J. Colaço, A. Matoga, A. Ilic, N. Roma, P. Tomás, R. Chaves, "Transparent Application Acceleration by Intelligent Scheduling of Shared Library Calls on Heterogeneous Systems", in International Conference on Parallel Processing and Applied Mathematics (PPAM), R. Wyrzykowski et al., Eds., Springer, sep, 2013, pp. 693–703. [bibtex] [pdf] [doi]
[35]A. Matoga, R. Chaves, P. Tomás, N. Roma, "A flexible shared library profiler for early estimation of performance gains in heterogeneous systems", in International Conference on High Performance Computing & Simulation (HPCS), IEEE, jul, 2013, pp. 461–470. [bibtex] [pdf] [doi]
[34]N. Neves, N. Sebastião, A. Patricio, D. M. d. Matos, P. Tomás, P. Flores, N. Roma, "BioBlaze: Multi-core SIMD ASIP for DNA sequence alignment", in International Conference on Application-Specific Systems, Architectures and Processors (ASAP), IEEE Computer Society, jun, 2013, pp. 241–244. [bibtex] [pdf] [doi]
2012
[33]N. Roma, P. Magalhães, "System-level prototyping framework for heterogeneous multi-core architecture applied to biological sequence analysis", in IEEE International Symposium on Rapid System Prototyping (RSP), IEEE, oct, 2012, pp. 156–162. [bibtex] [pdf] [doi]
[32]T. Dias, L. Rosário, N. Roma, L. Sousa, "High Performance Unified Architecture for Forward and Inverse Quantization in H.264/AVC", in Euromicro Conference on Digital System Design (DSD), IEEE Computer Society, sep, 2012, pp. 632–639. [bibtex] [pdf] [doi]
[31]S. Momcilovic, N. Roma and L. Sousa, "Multi-level Parallelization of Advanced Video Coding on Hybrid CPU+GPU Platforms", in International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar), I. Caragiannis et al., Eds., Springer, aug, 2012, pp. 165–174. [bibtex] [pdf] [doi]
[30]A. Matoga, R. Chaves, P. Tomás, N. Roma, "An FPGA based Accelerator for Encrypted File Systems", in International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES), HiPEAC, jul, 2012. [bibtex] [pdf]
2011
[29]G. Encarnação, N. Sebastião and N. Roma, "Advantages and GPU implementation of high-performance indexed DNA search based on suffix arrays", in International Conference on High Performance Computing & Simulation (HPCS), W. W. Smari, J. P. McIntire, Eds., IEEE, jul, 2011, pp. 49–55. [bibtex] [pdf] [doi]
[28]T. Dias, S. López, N. Roma, L. Sousa, "High throughput and scalable architecture for unified transform coding in embedded H.264/AVC video coding systems", in International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), L. Carro, A. D. Pimentel, Eds., IEEE, jul, 2011, pp. 225–232. [bibtex] [pdf] [doi]
2010
[27]T. Dias, N. Roma and L. Sousa, "Hardware/software co-design of H.264/AVC encoders for multi-core embedded systems", in Conference on Design & Architectures for Signal & Image Processing (DASIP), IEEE, oct, 2010, pp. 242–249. [bibtex] [pdf] [doi]
[26]T. Dias, N. Roma and L. Sousa, "H.264/AVC framework for multi-core embedded video encoders", in International Symposium on System on Chip (SoC), IEEE, sep, 2010, pp. 89–92. [bibtex] [pdf] [doi]
[25]N. Sebastião, T. Dias, N. Roma, P. Flores, "Integrated accelerator architecture for DNA sequences alignment with enhanced traceback phase", in International Conference on High Performance Computing & Simulation (HPCS), W. W. Smari, J. P. McIntire, Eds., IEEE, jun, 2010, pp. 16–23. [bibtex] [pdf] [doi]
[24]A. Rodrigues, N. Roma and L. Sousa, "p264: open platform for designing parallel H.264/AVC video encoders on multi-core systems", in International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV), D. C. A. Bulterman, M. Hefeeda, Eds., ACM, jun, 2010, pp. 81–86. [bibtex] [pdf] [doi]
[23]T. Almeida, N. Roma, "A Parallel Programming Framework for Multi-core DNA Sequence Alignment", in International Conference on Complex, Intelligent and Software Intensive Systems (CISIS), L. Barolli et al., Eds., IEEE Computer Society, 2010, pp. 907–912. [bibtex] [pdf] [doi]
2009
[22]G. Passos, N. Roma, B. A. d. Costa, L. Sousa, J. M. Lemos, "Distributed Software Platform for Automation and Control of General Anaesthesia", in International Symposium on Parallel and Distributed Computing (ISPDC), L. Sousa, Y. Robert, Eds., IEEE Computer Society, jun, 2009, pp. 135–142. [bibtex] [pdf] [doi]
2008
[21]N. Sebastião, T. Dias, N. Roma, P. Flores, L. Sousa, "Application Specific Programmable IP Core for Motion Estimation: Technology Comparison Targeting Efficient Embedded Co-Processing Units", in Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD), L. Fanucci, Ed., IEEE Computer Society, sep, 2008, pp. 181–188. [bibtex] [pdf] [doi]
[20]N. Sebastião, T. Dias, N. Roma, P. Flores, L. Sousa, "Specialized Motion Estimation Processor for Heterogeneous Multicore Video Coding Systems", in International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES), HiPEAC, jul, 2008. [bibtex] [pdf]
2007
[19]N. Roma, L. Sousa, "Fully compressed-domain transcoder for PIP/PAP video composition", in Picture Coding Symposium (PCS), nov, 2007. [bibtex] [pdf]
[18]S. Momcilovic, N. Roma and L. Sousa, "Adaptive Motion Estimation Algorithm for H.264/AVC", in International Conference on Digital Signal Processing (DSP), Cardiff – U.K.: IEEE, jul, 2007. [bibtex] [pdf]
[17]S. Momcilovic, N. Roma and L. Sousa, "An ASIP Approach For Adaptive Motion Estimation on AVC", in Conference on PhD Research in Microelectronics and Electronics (PRIME), Bordeaux – France: IEEE, jul, 2007, pp. 165–168. [bibtex] [pdf]
2006
[16]T. Dias, N. Roma and L. Sousa, "Low Power Distance Measurement Unit for Real-Time Hardware Motion Estimators", in International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), J. Vounckx, N. Azémard, P. Maurine, Eds., Springer, sep, 2006, pp. 247–255. [bibtex] [pdf] [doi]
[15]S. Momcilovic, T. Dias, N. Roma, L. Sousa, "Application Specific Instruction Set Processor for Adaptive Video Motion Estimation", in Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD), IEEE Computer Society, sep, 2006, pp. 160–167. [bibtex] [pdf] [doi]
2005
[14]T. Dias, N. Roma and L. Sousa, "Efficient Motion Vector Refinement Architecture for Sub-Pixel Motion Estimation Systems", in IEEE Workshop on Signal Processing Systems (SiPS), Athens – Greece: IEEE, nov, 2005, pp. 313–318. [bibtex] [pdf] [doi]
[13]N. Roma, L. Sousa, "Least squares motion estimation algorithm in the compressed DCT domain for H.26x/MPEG-x video sequences", in IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), IEEE Computer Society, sep, 2005, pp. 576–581. [bibtex] [pdf] [doi]
[12]T. Dias, N. Roma and L. Sousa, "Efficient VLSI Architecture for Real-Time Motion Estimation in Advanced Video Coding", in IEEE International SOC Conference (SOCC), IEEE, sep, 2005, pp. 91–92. [bibtex] [pdf] [doi]
2003
[11]N. Roma, T. Dias and L. Sousa, "Fast Adder Architectures: Modeling and Experimental Evaluation", in XVIII Conference on Design of Circuits and Integrated Systems (DCIS), nov, 2003, pp. 367–372. [bibtex] [pdf]
[10]N. Roma, T. Dias and L. Sousa, "Customisable Core-Based Architectures for Real-Time Motion Estimation on FPGAs", in International Conference on Field Programmable Logic and Applications (FPL), sep, 2003, pp. 745–754. [bibtex] [pdf]
2002
[9]N. Roma, L. Sousa, "Insertion of Irregular-Shaped Logos in the Compressed DCT Domain", in IEEE International Conference on Digital Signal Processing (DSP), jul, 2002, pp. 125–128. [bibtex] [pdf] [doi]
2001
[8]N. Roma, L. Sousa, "A New VLSI Architecture for Full Search Block Matching", in IFIP International Conference on Very Large Scale Integration (VLSI-SoC), dec, 2001, pp. 213–218. [bibtex] [pdf] [doi]
[7]N. Roma, L. Sousa, "Parameterizable Hardware Architectures for Automatic Synthesis of Motion Estimation Processors", in IEEE Workshop on Signal Processing Systems (SiPS), sep, 2001, pp. 428–439. [bibtex] [pdf] [doi]
2000
[6]N. Roma, L. Sousa, "On the Development and Evaluation of Specialized Processors for Computing High-Order 2-D Image Moments in Real-Time", in International Workshop on Computer Architectures for Machine Perception (CAMP), IEEE Computer Society, sep, 2000, pp. 170. [bibtex] [pdf] [doi]
[5]N. Roma, J. Santos-Victor and J. Tomé, "A Comparative Analysis of Cross-Correlation Matching Algorithms Using a Pyramidal Resolution Approach", in Workshop on Empirical Evaluation Methods in Computer Vision, in conjunction with the European Conference on Computer Vision (ECCV), jun, 2000. [bibtex] [pdf]
1999
[4]C. Coelho, N. Roma and L. Sousa, "Pipeline Architectures for Computing 2-D Image Moments", in XIV Conference on Design of Circuits and Integrated Systems (DCIS), nov, 1999, pp. 169–174. [bibtex] [pdf]
[3]M. Ortigueira, N. Roma, C. Martins, M. Piedade, "An Archetypal Based ECG Analysis System", in III Congreso de Usuarios de MATLAB (MATLAB), nov, 1999. [bibtex] [pdf]
[2]L. Sousa, N. Roma, "Low-power array architectures for motion estimation", in IEEE Workshop on Multimedia Signal Processing (MMSP), K. J. R. Liu et al., Eds., IEEE, 1999, pp. 679–684. [bibtex] [pdf] [doi]
1998
[1]A. Abreu, N. Roma, L. Sousa, J. Gerald, "Digital Video Transmission through the Electrical Power Lines", in Second European DSP Education and Research Conference, sep, 1998. [bibtex] [pdf]
Theses
2008
[3]N. Roma, "Transform Domain Transcoding Systems for Static and Dynamic Video Composition", Ph.D. dissertation, Instituto Superior Técnico, Universidade Técnica de Lisboa, 2008. [bibtex] [pdf]
2001
[2]N. Roma, "Processadores Dedicados para Estimação de Movimento em Sequências de Vídeo", Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, 2001. [bibtex] [pdf]
1998
[1]A. Abreu, N. Roma, "Sistema de Transmissão de Vídeo através da Rede Eléctrica", Master’s thesis, Instituto Superior Técnico, Universidade Técnica de Lisboa, 1998. [bibtex] [pdf]